AITopics | text recognition

Collaborating Authors

text recognition

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

M5HisDoc: ALarge-scale Multi-style Chinese Historical Document Analysis Benchmark

Neural Information Processing SystemsMay-1-2026, 05:07:40 GMT

Recognizing and organizing text in correct reading order plays a crucial role in historical document analysis and preservation. While existing methods have shown promising performance, they often struggle with challenges such as diverse layouts, low image quality, style variations, and distortions. This is primarily due to the lack of consideration for these issues in the current benchmarks, which hinders the development and evaluation of historical document analysis and recognition (HDAR) methods in complex real-world scenarios. To address this gap, this paper introduces a complex multi-style Chinese historical document analysis benchmark, named M5HisDoc. The M5 indicates five properties of style, ie., Multiple layouts, Multiple document types, Multiple calligraphy styles, Multiple backgrounds, and Multiple challenges.

machine learning, pattern recognition, recognition, (19 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.31)

Add feedback

Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data

Xinghua Lou, Ken Kansky, Wolfgang Lehrach, CC Laan, Bhaskara Marthi, D. Phoenix, Dileep George

Neural Information Processing SystemsMar-23-2026, 03:38:56 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, pattern recognition, recognition, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Text Recognition (0.43)

Add feedback

UNIT: Unifying Image and Text Recognition in One Vision Encoder

Neural Information Processing SystemsMar-22-2026, 16:59:50 GMT

Currently, vision encoder models like Vision Transformers (ViTs) typically excel at image recognition tasks but cannot simultaneously support text recognition like human visual recognition. To address this limitation, we propose UNIT, a novel training framework aimed at UNifying Image and Text recognition within a single model. Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities. The training process comprises two stages: intra-scale pretraining and inter-scale finetuning. During intra-scale pretraining, UNIT learns unified representations from multi-scale inputs, where images and documents are at their commonly used resolution, to enable fundamental recognition capability. In the inter-scale finetuning stage, the model introduces scale-exchanged data, featuring images and documents at resolutions different from the most commonly used ones, to enhance its scale robustness. Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment. Experiments across multiple benchmarks confirm that our method significantly outperforms existing methods on document-related tasks (e.g., OCR and DocQA) while maintaining the performances on natural images, demonstrating its ability to substantially enhance text recognition without compromising its core image recognition capabilities.

artificial intelligence, machine learning, pattern recognition, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Add feedback

dcfb4b195fe1ec750ef13312974b620e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 09:28:20 GMT

large language model, machine learning, resolution, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom (0.04)
Europe > Middle East (0.04)
Europe > Eastern Europe (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

HisDoc: A Large-scale Multi-style Chinese Historical Document Analysis Benchmark

Neural Information Processing SystemsFeb-18-2026, 00:59:42 GMT

Historical documents are invaluable carriers of human cultural heritage, containing important information about human history, culture, and literary arts.

machine learning, pattern recognition, recognition, (19 more...)

Neural Information Processing Systems

Country:

Africa > Cameroon > Far North Region > Maroua (0.04)
Asia > Japan (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
(2 more...)

Add feedback

be6122d1c7dec017c2bb1b168637f146-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 21:22:04 GMT

machine learning, natural language, recognition, (19 more...)

Neural Information Processing Systems

Country: Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

4acb23d5d9b4bea566799afac0ee3125-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 14:15:55 GMT

machine learning, natural language, recognition, (19 more...)

Neural Information Processing Systems

Country:

Asia > Indonesia > Bali (0.04)
Europe > France (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

2bda52aca6d214904eceffbce50f2e8c-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 04:05:10 GMT

dataset, typographic attack dataset, visual feature, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangxi Province (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

634841a6831464b64c072c8510c7f35c-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-8-2026, 16:05:52 GMT

adashare, domainnet, experiment, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai

Nonesung, Surapon, Jaknamon, Teetouch, Chaiophat, Sirinya, Nitarach, Natapong, Wittayasakpan, Chanakan, Sirichotedumrong, Warit, Na-Thalang, Adisai, Pipatanakul, Kunat

arXiv.org Artificial IntelligenceDec-5-2025

We present ThaiOCRBench, the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks. Despite recent progress in multimodal modeling, existing benchmarks predominantly focus on high-resource languages, leaving Thai underrepresented, especially in tasks requiring document structure understanding. ThaiOCRBench addresses this gap by offering a diverse, human-annotated dataset comprising 2,808 samples across 13 task categories. We evaluate a wide range of state-of-the-art VLMs in a zero-shot setting, spanning both proprietary and open-source systems. Results show a significant performance gap, with proprietary models (e.g., Gemini 2.5 Pro) outperforming open-source counterparts. Notably, fine-grained text recognition and handwritten content extraction exhibit the steepest performance drops among open-source models. Through detailed error analysis, we identify key challenges such as language bias, structural mismatch, and hallucinated content. ThaiOCRBench provides a standardized framework for assessing VLMs in low-resource, script-complex settings, and provides actionable insights for improving Thai-language document understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.04479

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback